Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models

Abstract

Messenger RNA sequences possess specific nucleotide patterns that distinguish them from non-coding genomic sequences. In this study, we explore the use of modified Markov models to analyze sequences of up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. To analyze nucleotide sequences of this length, their information content is first reduced by converting them into shorter binary patterns using numerous abstraction schemes. After the genomic sequences are converted to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best Markov model (MM) classifiers are then combined into a single classifier using support vector machines. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code, such as ncRNAs and 5′-untranslated regions.
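The pipeline described in the abstract (abstract the nucleotides to bits, train one homogeneous Markov model per class on the binary strings, then compare likelihoods) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the purine/pyrimidine mapping `PURINE_SCHEME` stands in for one of the many abstraction schemes, the model order, the add-one smoothing, and the toy training sequences are made up, and the function names are hypothetical. The scheme optimization and the support vector machine combination step are omitted.

```python
import math
from collections import defaultdict

# Hypothetical abstraction scheme (an assumption, not the paper's):
# purines (A, G) map to "1", pyrimidines (C, T) map to "0".
PURINE_SCHEME = {"A": "1", "G": "1", "C": "0", "T": "0"}


def abstract(seq, scheme=PURINE_SCHEME):
    """Convert a nucleotide sequence into a binary string under a scheme."""
    return "".join(scheme[base] for base in seq.upper())


def train_markov(binary_seqs, order):
    """Count, for each context of `order` bits, how often 0 and 1 follow it."""
    counts = defaultdict(lambda: [0, 0])
    for s in binary_seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][int(s[i])] += 1
    return dict(counts)


def log_likelihood(binary_seq, counts, order):
    """Log-probability of a binary string, with add-one smoothing."""
    total = 0.0
    for i in range(order, len(binary_seq)):
        c0, c1 = counts.get(binary_seq[i - order:i], (0, 0))
        seen = c1 if binary_seq[i] == "1" else c0
        total += math.log((seen + 1) / (c0 + c1 + 2))
    return total


def classify(seq, exon_counts, intron_counts, order):
    """Label a sequence by the sign of the exon/intron log-likelihood ratio."""
    b = abstract(seq)
    score = (log_likelihood(b, exon_counts, order)
             - log_likelihood(b, intron_counts, order))
    return "exon" if score > 0 else "intron"


# Toy, made-up training data purely to exercise the code.
ORDER = 2
exon_model = train_markov([abstract("AGAGAGAGAG")], ORDER)
intron_model = train_markov([abstract("CTCTCTCTCT")], ORDER)
label = classify("AGGAAG", exon_model, intron_model, ORDER)
```

Because the alphabet shrinks from four symbols to two, a context of a given length has far fewer possible values, which is why a model over binary strings can afford much longer contexts than one over raw nucleotides.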
